Statistique et Big Data Analytics; Volumétrie, L'Attaque des Clones

Author

  • Philippe Besse
Abstract

This article assumes that the reader already has the skills and expertise of a statistician in unsupervised learning (NMF, k-means, SVD) and supervised learning (regression, CART, random forest). What skills and knowledge must the statistician acquire to reach the "Volume" scale of big data? After a quick overview of the different strategies available, especially those imposed by Hadoop, the algorithms of some available learning methods are outlined to show how they are adapted to the strong constraints of Map-Reduce functionalities. The next step will probably be to rewrite them using the R-like matrix language being developed by the Mahout, Spark and Scala communities.

Affiliations: Université de Toulouse, INSA, Institut de Mathématiques, UMR CNRS 5219; INRA, UR875 MIA-T

arXiv:1405.6676v2 [stat.OT] 5 Oct 2014

Figure 1: Army of clones: server racks lined up by the thousands in the hangar of a Google data center.

Similar resources

Classification incrémentale supervisée : un panel introductif

Abstract. The last ten years have witnessed great progress in the field of statistical learning and data mining. It is now possible to find efficient and automatic learning algorithms. Historically, learning methods assumed that all the data were available and could be loaded into memory for ...

Big Data - Retour vers le Futur 3; De Statisticien à Data Scientist

The rapid evolution of information systems managing ever larger volumes of data has caused profound paradigm shifts in the work of the statistician, who has successively become a data miner, a bioinformatician and now a data scientist. Without aiming to be exhaustive, and after illustrating these successive changes, this article briefly presents the new questi...

Big Data Analytics and Now-casting: A Comprehensive Model for Eventuality of Forecasting and Predictive Policies of Policy-making Institutions

The ability of now-casting and eventuality is the most crucial and vital achievement of big data analytics in the area of policy-making. To recognize the trends and to render a real image of the current condition and alarming immediate indicators, the significance and the specific positions of big data in policy-making are undeniable. Moreover, the requirement for policy-making institutions to ...

Cryptanalyse de Achterbahn-128/80

This paper presents two attacks against Achterbahn-128/80, the last version of one of the stream cipher proposals in the eSTREAM project. The attack against the 80-bit variant, Achterbahn-80, has complexity 2^{56.32}. The attack against Achterbahn-128 requires 2^{75.4} operations and 2^{61} keystream bits. These attacks are based on an improvement of the attack due to Hell and Johansson against...

A Fuzzy TOPSIS Approach for Big Data Analytics Platform Selection

Big data sizes are constantly increasing. Big data analytics is where advanced analytic techniques are applied on big data sets. Analytics based on large data samples reveals and leverages business change. The popularity of big data analytics platforms, which are often available as open-source, has not remained unnoticed by big companies. Google uses MapReduce for PageRank and inverted indexes....

Journal title:
  • CoRR

Volume abs/1405.6676  Issue 

Pages  -

Publication date 2014